Improving Simple Models with Confidence Profiles
Amit Dhurandhar, Karthikeyan Shanmugam, Ronny Luss, Peder A. Olsen
In this paper, we propose a new method called ProfWeight for transferring information from a pre-trained deep neural network that has high test accuracy to a simpler interpretable model or a very shallow network of low complexity and a priori low test accuracy. We are motivated by applications in interpretability and model deployment in severely memory-constrained environments (such as sensors). Our method uses linear probes on flattened intermediate representations to generate confidence scores. Our transfer method involves a theoretically justified weighting of samples, based on the confidence scores of these intermediate layers, during the training of the simple model. The value of our method is first demonstrated on CIFAR-10, where our weighting method significantly improves (by 3-4%) networks with only a fraction of the ResNet blocks of a complex ResNet model. We further demonstrate operationally significant results on a real manufacturing problem, where we dramatically increase the test accuracy of a CART model (the domain standard) by roughly 13%.
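As a rough illustration of the probing step, the sketch below attaches one linear classifier per flattened intermediate representation of a frozen, pre-trained network and records, for each sample, each probe's confidence in the true label. All names here (hidden_dims, probes, confidence_profile) are illustrative placeholders, not the paper's actual code.

```python
import torch
import torch.nn as nn

hidden_dims = [64, 128, 256]   # widths of the flattened intermediate layers (assumed)
num_classes = 10               # e.g. CIFAR-10

# One linear classifier ("probe") per intermediate representation.
# In practice each probe would be trained on the frozen network's activations.
probes = [nn.Linear(d, num_classes) for d in hidden_dims]

def confidence_profile(features, labels):
    """features: list of flattened activation tensors, one per probed layer;
    labels: ground-truth class indices, shape (batch,).
    Returns each sample's probe confidence in its true label at every layer."""
    scores = []
    for probe, feats in zip(probes, features):
        probs = torch.softmax(probe(feats), dim=1)
        # confidence the probe assigns to the correct class
        scores.append(probs.gather(1, labels.unsqueeze(1)).squeeze(1))
    return torch.stack(scores, dim=1)  # shape: (batch, num_probes)
```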
Reviews: Improving Simple Models with Confidence Profiles
The authors introduce ProfWeight, a method for transferring knowledge from a teacher model to a student model. A "confidence profile" (taken from classification layers placed throughout the network) is used to determine which training samples are easy and which are hard. The loss function for the student model is weighted to favor learning the easier samples. The authors test this method on CIFAR-10 and a real-world dataset. Quality: The idea presented by this paper is interesting and well-motivated. The method and results could be presented with more clarity, and the paper could benefit from some additional empirical analysis.
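The weighting the review describes can be sketched as follows: average each sample's probe confidences over the upper layers and use that average to scale the student's per-sample loss, so samples the teacher finds easy count more. The function and argument names (weighted_student_loss, probe_confidences, k) are hypothetical, and the paper's exact weighting scheme may differ from this simple average.

```python
import torch
import torch.nn.functional as F

def weighted_student_loss(student_logits, labels, probe_confidences, k=3):
    """probe_confidences: (batch, num_probes) true-label confidences,
    e.g. from a confidence_profile() as sketched above."""
    # Average confidence over the top-k probes; detach so weights
    # are treated as constants during the student's backward pass.
    weights = probe_confidences[:, -k:].mean(dim=1).detach()
    per_sample = F.cross_entropy(student_logits, labels, reduction="none")
    return (weights * per_sample).mean()
```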
Interpretability and performance: Can the same model achieve both?
In our work, Improving Simple Models with Confidence Profiles, we try to bridge this gap by proposing a method to transfer information from a high-performing neural network to another model that the domain expert or the application demands. For example, in computational biology and economics, sparse linear models are often preferred, while in complex instrumented domains such as semiconductor manufacturing, engineers might prefer decision trees. Such simpler interpretable models can build trust with the expert and provide useful insights, leading to the discovery of novel, previously unknown facts. Our goal is depicted pictorially below for a specific case in which we are trying to improve the performance of a decision tree. The assumption is that our network is a high-performing teacher, and we can use some of its information to teach the simple, interpretable, but generally lower-performing student model.
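For the decision-tree case, the transfer reduces to passing the confidence-derived weights into ordinary weighted training. A minimal sketch with scikit-learn, using synthetic stand-in data and weights in place of real activations and probe outputs, might look like this:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Stand-in data and weights: in practice, w would come from the
# network's confidence profile (e.g. mean upper-probe confidence).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 20))
y_train = rng.integers(0, 2, size=1000)
w = rng.uniform(0.1, 1.0, size=1000)  # placeholder for ProfWeight weights

# Weighted training of the interpretable student model.
tree = DecisionTreeClassifier(max_depth=5)
tree.fit(X_train, y_train, sample_weight=w)
```

Because standard CART implementations already accept per-sample weights, the method needs no change to the student's training algorithm itself; only the weights carry the teacher's information.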